1. Introduction


Consider the problem of estimating a univariate probability density function, $f$, using a sample $X_1, \dots, X_n$ from $f$. Let $\hat f = \hat f(x, X_1, \dots, X_n)$ denote an estimator. A common error norm is Mean Integrated Square Error, which is defined as follows. Let $w(x)$ be some nonnegative "weight function." Define

(1.1)   $\mathrm{MISE} = E \int [\hat f(x) - f(x)]^2 \, w(x) \, dx.$

An estimator which has been studied extensively (see, for example, the survey by Wertz (1978)) is the kernel estimator, which is defined as follows. Given a "kernel function" $K$ (with $\int K(x)\,dx = 1$) and a "bandwidth" $h > 0$, let

(1.2)   $\hat f(x,h) = n^{-1} \sum_{i=1}^{n} h^{-1} K\left(\frac{x - X_i}{h}\right).$
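A minimal Python sketch of the kernel estimator (1.2) may help fix ideas. The Epanechnikov kernel $K(u) = \tfrac34(1-u^2)$ on $[-1,1]$ is an assumed choice made only for illustration (it is compactly supported and integrates to 1), and the function names are hypothetical.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere; integrates to 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x: float, data: list[float], h: float, kernel=epanechnikov) -> float:
    """Kernel estimator (1.2): f_hat(x, h) = n^{-1} * sum_i h^{-1} K((x - X_i) / h)."""
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.5, 0.9]
    # Midpoint-rule check over [-1, 2] that the estimate integrates to about 1.
    step = 1e-3
    total = sum(kde(-1.0 + (k + 0.5) * step, sample, 0.3) for k in range(3000)) * step
    print(f"f_hat(0.5, 0.3) = {kde(0.5, sample, 0.3):.5f}, integral ~ {total:.5f}")
```

Each observation contributes a bump of mass $1/n$ and width proportional to $h$, which is why the choice of $h$ governs the smoothness of $\hat f$.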

The "bandwidth problem" consists of specifying $h = h(n)$ in some asymptotically (as $n \to \infty$) optimal fashion. Under very precise assumptions on the amount of smoothness of $f$, there are many results where $h(n)$ is given deterministically to asymptotically minimize MISE or some other error norm. See, for example, Rosenblatt (1956), Parzen (1962), or Watson and Leadbetter (1963). Unfortunately, this type of result is virtually useless in practice because the optimal $h(n)$ is a function of the (unknown) smoothness of $f$. This may be seen especially clearly from the results of Stone (1980), who deals with a continuum of smoothness classes. Thus there has been a considerable search for techniques which use the data to specify $h$.

A popular technique of this type is the "cross-validated" or "pseudo-maximum-likelihood" method introduced by Habbema, Hermans, and van den Broek (1974). This is defined as follows. For $j = 1, \dots, n$ form the "leave one out" kernel estimator,

(1.3)   $\hat f_j(x,h) = (n-1)^{-1} \sum_{i \ne j} h^{-1} K\left(\frac{x - X_i}{h}\right).$

Then take $\hat h_1$ to maximize the "estimated likelihood,"

$L_1(h) = \prod_{j=1}^{n} \hat f_j(X_j, h).$
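The cross-validation rule can be sketched in Python as follows. This is a sketch under stated assumptions: a nonnegative kernel (Epanechnikov, an illustrative choice) and a finite grid of candidate bandwidths supplied by the user, whereas the text maximizes over all $h > 0$. The grid search and the function names are hypothetical simplifications. Working with $\log L_1(h)$ avoids numerical underflow of the product.

```python
import math


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel on [-1, 1] (an assumed, nonnegative kernel)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def loo_kde(j: int, data: list[float], h: float, kernel=epanechnikov) -> float:
    """Leave-one-out estimator (1.3) evaluated at X_j:
    (n-1)^{-1} * sum_{i != j} h^{-1} K((X_j - X_i) / h)."""
    xj = data[j]
    s = sum(kernel((xj - xi) / h) for i, xi in enumerate(data) if i != j)
    return s / ((len(data) - 1) * h)


def log_L1(h: float, data: list[float], kernel=epanechnikov) -> float:
    """log L_1(h) = sum_j log f_hat_j(X_j, h); returns -inf when some factor is 0
    (for a nonnegative kernel this means L_1(h) = 0)."""
    total = 0.0
    for j in range(len(data)):
        v = loo_kde(j, data, h, kernel)
        if v <= 0.0:
            return -math.inf
        total += math.log(v)
    return total


def cv_bandwidth(data: list[float], grid: list[float], kernel=epanechnikov) -> float:
    """Approximate h_1 = argmax L_1(h) by searching a finite grid of bandwidths."""
    return max(grid, key=lambda h: log_L1(h, data, kernel))
```

For example, `cv_bandwidth([0.0, 0.1, 0.2, 1.0], [0.05, 1.0])` rejects $h = 0.05$, because the isolated point $1.0$ then has a leave-one-out estimate of exactly 0.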

A recent paper by Chow, Geman and Wu (1983) contains some interesting heuristics and a consistency theorem for the estimator $\hat f(x, \hat h_1)$. Despite these encouraging results, this estimator can be very poorly behaved. Section 2 contains examples which illustrate some of the pitfalls that may be encountered by this estimator. That section also contains a series of heuristically motivated modifications of $L_1(h)$, leading to the version that is seen to be asymptotically optimal in the theorems of Section 3. The reader who is only interested in the form of the optimal estimator may skip all of Section 2 except (2.11).

Section 4 contains some remarks. The last section contains the proof of the optimality theorem.

2. Modification of cross-validation.

To see how $\hat f(x, \hat h_1)$ can be poorly behaved, consider the following example. Suppose the density $f$ has cumulative distribution function $F$ so that, for some $\varepsilon > 0$,

$F(x) = e^{-1/x}$ for $x \in (0, \varepsilon)$.

Such an $F$ could easily be constructed to be infinitely differentiable. Let $X_{(1)}$ and $X_{(2)}$ denote the first two order statistics of $X_1, \dots, X_n$. It follows from Example 1.7.3 and Theorem 2.3.2 of Leadbetter, Lindgren and Rootzén (1983) that

$\lim_{\rho \to 0} \lim_{n \to \infty} P\left[X_{(2)} - X_{(1)} > \frac{\rho}{(\log n)^2}\right] = 1.$

But for compactly supported $K$ (such as, for example, the "optimal kernels" of Epanechnikov (1969) or Sacks and Ylvisaker (1981)), $L_1(h) = 0$ unless $h \ge c\,(X_{(2)} - X_{(1)})$ for some constant $c$, since for smaller $h$ the factor of $L_1(h)$ belonging to $X_{(1)}$ vanishes. Thus, the cross-validated $\hat h_1$ must converge to 0 slower than any algebraic rate.
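The mechanism can be seen in a small simulation sketch. It assumes the Epanechnikov kernel (support $[-1,1]$, so $c = 1$ here) and samples from $F(x) = e^{-1/x}$ on all of $(0,\infty)$ by inversion; only the behavior of $F$ near 0 matters for the example, and the helper names are hypothetical.

```python
import math
import random


def epanechnikov(u: float) -> float:
    """Compactly supported kernel on [-1, 1]; note K(-1) = K(1) = 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def draw_frechet(rng: random.Random) -> float:
    """One draw from F(x) = exp(-1/x), x > 0, by inversion X = -1/log(U)."""
    while True:
        u = rng.random()  # u in [0, 1)
        if u > 0.0:  # avoid log(0)
            return -1.0 / math.log(u)


def loo_at_minimum(data: list[float], h: float, kernel=epanechnikov) -> float:
    """Leave-one-out estimate f_hat_j(X_(1), h) at the smallest observation."""
    xs = sorted(data)
    s = sum(kernel((xs[0] - xi) / h) for xi in xs[1:])
    return s / ((len(xs) - 1) * h)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10**2, 10**3, 10**4):
        xs = sorted(draw_frechet(rng) for _ in range(n))
        gap = xs[1] - xs[0]
        # For h <= gap the factor of L_1(h) belonging to X_(1) is exactly 0,
        # so the cross-validated bandwidth cannot be smaller than the gap.
        print(n, gap, loo_at_minimum(xs, gap))
```

Printing the gap for increasing $n$ shows how slowly $X_{(2)} - X_{(1)}$ shrinks, in line with the $(\log n)^{-2}$ scale above.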



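By the familiar variance and bias$^2$ decomposition (see Rosenblatt (1971)) the mean square error may be written:

$E[\hat f(x,h) - f(x)]^2 = O\left(\frac{1}{nh}\right) + O(h^{2s}),$

where $s$ represents the amount of smoothness that is assumed on $f$. Since $\hat h_1$ tends to 0 more slowly than any algebraic rate, the bias term need not decrease at any algebraic rate either. Hence it is apparent that the estimator $\hat f(x, \hat h_1)$ can behave very poorly in the mean square sense.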

Analogous, though not so dramatic, examples can be constructed by taking, for $k$ large,

$F(x) = x^k$ for $x \in (0, \varepsilon)$,

or by taking $K$ no longer compactly supported, but with suitably "light tails." These examples indicate that, even when $f$ is very smooth and compactly supported, ordinary cross-validated estimators can be drastically affected by data points where $f$ is close to 0.

A reasonable way to eliminate the above difficulty is the following. Find an interval $[a,b]$ on which $f$ is known to be bounded away from 0. The assumption of the existence of such an interval seems easy for the practitioner to accept. Next redefine the estimated likelihood

$L_2(h) = \prod_{j=1}^{n} \hat f_j(X_j, h)^{1_{[a,b]}(X_j)},$

and take $\hat h_2$ to maximize $L_2(h)$. Note that cross-validation is performed only over those observations which lie in $[a,b]$.
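A sketch of the restricted criterion $\log L_2(h)$ follows, again assuming the Epanechnikov kernel for illustration (the function name is hypothetical). Observations outside $[a,b]$ still enter the leave-one-out estimates at the other points, but contribute no factor of their own.

```python
import math


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel on [-1, 1] (assumed for illustration)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def log_L2(h: float, data: list[float], a: float, b: float, kernel=epanechnikov) -> float:
    """log L_2(h): sum over X_j in [a, b] of log f_hat_j(X_j, h); -inf if some factor is 0."""
    n = len(data)
    total = 0.0
    for j, xj in enumerate(data):
        if not (a <= xj <= b):
            continue  # exponent 1_[a,b](X_j) = 0, so this factor equals 1
        s = sum(kernel((xj - xi) / h) for i, xi in enumerate(data) if i != j)
        v = s / ((n - 1) * h)
        if v <= 0.0:
            return -math.inf
        total += math.log(v)
    return total
```

With data `[0.0, 0.1, 0.2, 0.3, 5.0]` and $[a,b] = [0, 0.3]$, the isolated point $5.0$ no longer forces the criterion to 0 for moderate $h$, which is exactly the pathology removed by the restriction.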

The estimator $\hat f(x, \hat h_2)$ has been studied by Hall (1982), although he seems to have arrived at it by considerations different from the above. The notation used here (different from that of Hall) is due to Peter Bloomfield and will facilitate the rest of this discussion. Hall's results show that, while the above pathologies cause no problems, this version of cross-validation still behaves suboptimally with respect to the rate of convergence of mean square error. It is interesting to note that the dominant term in his expansions depends only on the behavior of $f$ at the endpoints of $[a,b]$.

David Ruppert has suggested the following heuristic explanation of this endpoint effect. Note that if $f'(a) < 0$, there will be more $X_j$'s "just to the left" of $a$ than "just to the right." Hence if $h$ is taken to be relatively large, more probability mass (of the density $\hat f(x,h)$) will be moved into the interval $[a,b]$, which will thus increase $L_2(h)$. Hence there will be a tendency for cross-validation to "oversmooth" (i.e., take $h$ too large). On the other hand, if $f'(a) > 0$, then, by the same argument, cross-validation will tend to "undersmooth" in order to keep as much probability mass inside $[a,b]$ as possible. When this effect is taken into account at both endpoints simultaneously, it is not surprising that Hall reports oversmoothing when $f'(b) - f'(a) > 0$ and undersmoothing when $f'(b) - f'(a) < 0$.

With this insight, Ruppert has proposed eliminating this effect in the following way. First, for $j = 1, \dots, n$, define

(2.1)   $\hat p_j = \int_a^b \hat f_j(x,h)\,dx.$

Next redefine the estimated likelihood

(2.2)   $L_3(h) = \prod_{j=1}^{n} \left[\frac{\hat f_j(X_j,h)}{\hat p_j}\right]^{1_{[a,b]}(X_j)}.$

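For these heuristics assume $K$ is nonnegative and $f(x)\log f(x)$ is integrable. Write $p = \int_a^b f(x)\,dx$ and $\hat p = \int_a^b \hat f(x,h)\,dx$. By a Law of Large Numbers,

(2.3)   $n^{-1}\log L_3(h) = n^{-1}\sum_{j=1}^{n} 1_{[a,b]}(X_j)\left[\log \hat f_j(X_j,h) - \log \hat p_j\right] \approx \int_a^b f(x)\log \hat f(x,h)\,dx - p\log \hat p.$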


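But now by Jensen's Inequality,

(2.4)   $\int_a^b \frac{f(x)}{p}\log\left(\frac{p\,\hat f(x,h)}{\hat p\, f(x)}\right)dx \le \log\left(\int_a^b \frac{\hat f(x,h)}{\hat p}\,dx\right) = 0,$

with equality if and only if

$\frac{\hat f(x,h)}{\hat p} = \frac{f(x)}{p}$ for almost all $x \in [a,b]$.

Hence

(2.5)   $\int_a^b f(x)\log \hat f(x,h)\,dx - p\log \hat p \le \int_a^b f(x)\log f(x)\,dx - p\log p.$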

Thus, $L_3$ is essentially using the conditional Kullback-Leibler information (the left-hand side of (2.4)) as a measure of how well $\hat f(x,h)$ approximates $f(x)$. But this measure has the disturbing property that it fails to distinguish between $\hat f$ and $f$ when they are unequal but proportional to each other: for instance, $\hat f(\cdot,h) = 2f$ on $[a,b]$ gives equality in (2.4) and (2.5).

Peter Bloomfield has suggested overcoming this difficulty by sharpening the inequality (2.5) using the following device. Note that for $x, y > 0$,

(2.6)   $y\log(x/y) \le x - y,$

with equality only when $x = y$. Hence

$p\log \hat p - p\log p \le \hat p - p.$

It now follows from (2.5) that

(2.7)   $\int_a^b f(x)\log \hat f(x,h)\,dx - \hat p \le \int_a^b f(x)\log f(x)\,dx - p,$

with equality if and only if $f(x) = \hat f(x,h)$ for almost all $x \in [a,b]$. Now, reversing the heuristic argument (2.3), it is apparent that the estimated likelihood should be redefined as


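$L_4(h) = \prod_{j=1}^{n}\left[\hat f_j(X_j,h)\, e^{-\hat p_j/p}\right]^{1_{[a,b]}(X_j)},$

and $\hat h_4$ taken to maximize $L_4(h)$. (Since $n^{-1}\sum_{j} 1_{[a,b]}(X_j) \to p$, the argument of (2.3) gives $n^{-1}\log L_4(h) \approx \int_a^b f(x)\log \hat f(x,h)\,dx - \hat p$, the left-hand side of (2.7).)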

Peter Bloomfield has pointed out that $L_4(h)$ may be somewhat simplified, from the computational viewpoint, in the following way. Note that

$\hat p_j = (n-1)^{-1}\sum_{i \ne j} \rho(X_i),$

where

(2.8)   $\rho(x) = \int_a^b h^{-1} K\left(\frac{y - x}{h}\right) dy.$

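Hence, by a Strong Law of Large Numbers,

$\prod_{j=1}^{n} \exp\left(-1_{[a,b]}(X_j)\,\hat p_j/p\right) = \exp\left(-\sum_{j=1}^{n} 1_{[a,b]}(X_j)\,(n-1)^{-1}\sum_{i\ne j}\rho(X_i)/p\right)$

$\qquad = \exp\left(-\sum_{i=1}^{n}\rho(X_i)\,(n-1)^{-1}\sum_{j\ne i}1_{[a,b]}(X_j)/p\right) \approx \exp\left(-\sum_{i=1}^{n}\rho(X_i)\right).$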
Thus redefine the estimated likelihood

$L_5(h) = \prod_{j=1}^{n} \hat f_j(X_j,h)^{1_{[a,b]}(X_j)}\, e^{-\rho(X_j)}.$

Note that this also avoids difficulties about the fact that $p$ in $L_4(h)$ is unknown.
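For a kernel with a closed-form distribution function, $\rho(x)$ in (2.8) is just a difference of two kernel CDF values, which is what makes this simplification cheap. A sketch, assuming the Epanechnikov kernel (whose CDF on $[-1,1]$ is $\tfrac12 + \tfrac34 u - \tfrac14 u^3$) and hypothetical function names; it also checks numerically that $(n-1)^{-1}\sum_{i\ne j}\rho(X_i)$ equals $\hat p_j = \int_a^b \hat f_j(x,h)\,dx$.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel on [-1, 1] (assumed for illustration)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def epanechnikov_cdf(u: float) -> float:
    """Integral of the Epanechnikov kernel from -1 to u."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def rho(x: float, h: float, a: float, b: float) -> float:
    """(2.8): rho(x) = int_a^b h^{-1} K((y - x)/h) dy = F_K((b - x)/h) - F_K((a - x)/h)."""
    return epanechnikov_cdf((b - x) / h) - epanechnikov_cdf((a - x) / h)


def p_hat(j: int, data: list[float], h: float, a: float, b: float) -> float:
    """p_hat_j = (n-1)^{-1} * sum_{i != j} rho(X_i)."""
    return sum(rho(xi, h, a, b) for i, xi in enumerate(data) if i != j) / (len(data) - 1)


def loo_density(x: float, j: int, data: list[float], h: float) -> float:
    """f_hat_j(x, h) from (1.3); used here only to check p_hat by numerical integration."""
    s = sum(epanechnikov((x - xi) / h) for i, xi in enumerate(data) if i != j)
    return s / ((len(data) - 1) * h)
```

Note that $\rho(x) = 1$ whenever the kernel window around $x$ lies inside $[a,b]$, so only observations within about $h$ of an endpoint contribute a nontrivial factor $e^{-\rho(X_j)}$ beyond a constant.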

One last refinement will now be made. Many authors, starting with Parzen (1962) and Watson and Leadbetter (1963), have noticed that the asymptotic properties of $K$ can be greatly improved by allowing $K(x)$ to be negative for some $x$. The results of this paper apply to either this type of kernel or the nonnegative kernels which guarantee that $\hat f$ is "range-preserving." However, the proofs in this paper involve taking logarithms, so it is necessary to do some truncation. Define, for $x \in \mathbb{R}$,

(2.9)   $\hat f^+(x,h) = \max(\hat f(x,h), 0),$

and for $j = 1, \dots, n$,

(2.10)   $\hat f_j^+(x,h) = \max(\hat f_j(x,h), 0).$

Now redefine the estimated likelihood

(2.11)   $L(h) = \prod_{j=1}^{n} \hat f_j^+(X_j,h)^{1_{[a,b]}(X_j)}\, e^{-\rho(X_j)},$

and take $\hat h$ to maximize $L(h)$. It will be seen in Section 3 that the estimator $\hat f(x, \hat h)$ has excellent asymptotic properties.
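Putting the pieces together, the criterion (2.11) can be sketched as below. Assumptions, all for illustration only: the Epanechnikov kernel (so the truncation in (2.10) only matters when $\hat f_j(X_j,h) = 0$), a finite bandwidth grid in place of maximization over all $h$, and a simulated $N(0,1)$ sample with $[a,b] = [-1,1]$. The function names are hypothetical.

```python
import math
import random


def epanechnikov(u: float) -> float:
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def epanechnikov_cdf(u: float) -> float:
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def log_L(h: float, data: list[float], a: float, b: float) -> float:
    """log L(h) for (2.11): sum_j [1_[a,b](X_j) log f_hat_j^+(X_j, h) - rho(X_j)].
    Returns -inf when f_hat_j(X_j, h) <= 0 for some X_j in [a, b], i.e. L(h) = 0."""
    n = len(data)
    total = 0.0
    for j, xj in enumerate(data):
        # rho(X_j) from (2.8), in closed form for the Epanechnikov kernel
        total -= epanechnikov_cdf((b - xj) / h) - epanechnikov_cdf((a - xj) / h)
        if a <= xj <= b:
            s = sum(epanechnikov((xj - xi) / h) for i, xi in enumerate(data) if i != j)
            fj = s / ((n - 1) * h)
            if fj <= 0.0:  # truncated factor f_hat_j^+ = max(f_hat_j, 0) is 0
                return -math.inf
            total += math.log(fj)
    return total


def select_bandwidth(data: list[float], a: float, b: float, grid: list[float]) -> float:
    """Approximate h_hat = argmax L(h) over a finite grid of candidate bandwidths."""
    return max(grid, key=lambda h: log_L(h, data, a, b))


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    grid = [0.1 * k for k in range(1, 21)]
    print("selected h:", select_bandwidth(sample, -1.0, 1.0, grid))
```

The only difference from the sketch of $L_2$ is the penalty $-\rho(X_j)$ summed over all observations; this is the term that counteracts the endpoint effect described above.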

An interesting side effect of the above truncation is the following. If for some $h$ there is an $X_j \in [a,b]$ for which $\hat f_j(X_j,h) \le 0$, then $L(h) = 0$. Hence such an $h$ cannot be chosen to be $\hat h$. Thus, since

$\hat f(X_j,h) = \frac{n-1}{n}\hat f_j(X_j,h) + \frac{1}{nh}K(0),$

if $K(0) \ge 0$, then $\hat f(X_j, \hat h) > 0$ for each $X_j \in [a,b]$. Hence the estimator $\hat f(x,\hat h)$ has the property that it is range-preserving (i.e., positive) at each data point in $[a,b]$. Of course, the experimenter who requires that $\hat f$ be range-preserving outside the interval $[a,b]$ can guarantee this by taking $K$ nonnegative.
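The identity relating $\hat f(X_j,h)$ to $\hat f_j(X_j,h)$ is easy to confirm numerically, including for a kernel that takes negative values. The sketch below assumes the fourth-order polynomial kernel $K(u) = \tfrac{15}{32}(3 - 10u^2 + 7u^4)$ on $[-1,1]$, a standard textbook example chosen here for illustration (not taken from the paper); it integrates to 1, has vanishing second moment, is negative for some $u$, and has $K(0) = 45/32 > 0$.

```python
def fourth_order_kernel(u: float) -> float:
    """Assumed order-4 kernel on [-1, 1]: (15/32)(3 - 10u^2 + 7u^4); negative for some u."""
    if abs(u) > 1.0:
        return 0.0
    return 15.0 / 32.0 * (3.0 - 10.0 * u * u + 7.0 * u ** 4)


def kde(x: float, data: list[float], h: float, kernel) -> float:
    """Kernel estimator (1.2)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)


def loo_kde(j: int, data: list[float], h: float, kernel) -> float:
    """Leave-one-out estimator (1.3) at X_j."""
    xj = data[j]
    s = sum(kernel((xj - xi) / h) for i, xi in enumerate(data) if i != j)
    return s / ((len(data) - 1) * h)


def identity_gap(data: list[float], h: float, kernel=fourth_order_kernel) -> float:
    """max_j |f_hat(X_j,h) - [(n-1)/n * f_hat_j(X_j,h) + K(0)/(nh)]|; should be ~0."""
    n = len(data)
    return max(
        abs(kde(xj, data, h, kernel)
            - ((n - 1) / n * loo_kde(j, data, h, kernel) + kernel(0.0) / (n * h)))
        for j, xj in enumerate(data)
    )
```

Because $K(0) > 0$, any data point with a positive leave-one-out value also has a positive full estimate, which is the range-preserving property stated above.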


3.  Asymptotic  Optimality  Theorems 

It is well known (see, for example, Rosenblatt (1971)) that MISE admits the variance-bias$^2$ expansion

(3.1)   $\mathrm{MISE}(h) = n^{-1}h^{-1}\left(\int f(y)w(y)\,dy\right)\left(\int K(u)^2\,du\right) + o(n^{-1}h^{-1}) + s_f(h),$

where the bias$^2$ part is

(3.2)   $s_f(h) = \int\left[\int K(u) f(y - hu)\,du - f(y)\right]^2 w(y)\,dy.$

Since the papers of Rosenblatt (1956) and Parzen (1962), expansions similar to the above have been handled as follows. Assume $K$ satisfies:

(3.3)   $\int K(x)\,dx = 1, \qquad \int x^j K(x)\,dx = 0 \ \ (j = 1, \dots, k-1), \qquad \int x^k K(x)\,dx \ne 0.$

Also assume $f$ has a bounded $k$-th derivative. By Taylor's Theorem,

(3.4)   $s_f(h) = h^{2k}\int [f^{(k)}(y)]^2 w(y)\,dy\left[\int u^k K(u)\,du / k!\right]^2 + o(h^{2k}).$

Now, to find the "optimal bandwidth," ignore the terms $o(n^{-1}h^{-1})$ and $o(h^{2k})$ (which are of lower order, uniformly over $h$) in (3.1) and (3.4), and choose $h$ to minimize

(3.5)   $A n^{-1} h^{-1} + B h^{2k},$

where $A$ and $B$ are the obvious coefficients in (3.1) and (3.4).
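Setting the derivative of (3.5) to zero gives the familiar closed form. A short worked derivation, treating $A$ and $B$ as known positive constants (which, as the next paragraph stresses, they are not in practice):

```latex
\frac{d}{dh}\Bigl(A n^{-1} h^{-1} + B h^{2k}\Bigr) = -A n^{-1} h^{-2} + 2kB\,h^{2k-1} = 0
\;\Longrightarrow\;
h^{*} = \Bigl(\frac{A}{2kB}\Bigr)^{1/(2k+1)} n^{-1/(2k+1)},
\qquad
A n^{-1} (h^{*})^{-1} + B (h^{*})^{2k} \asymp n^{-2k/(2k+1)} .
```

The second derivative $2An^{-1}h^{-3} + 2k(2k-1)Bh^{2k-2}$ is positive, so $h^*$ is a minimizer; this is the form $c\,n^{-(2k+1)^{-1}}$ referred to later in this section, with $c = (A/(2kB))^{1/(2k+1)}$.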

While this solution to the bandwidth problem is theoretically pleasing, it is useless in practice because the quantities $A$ and $B$ are unknown. The main theorem of this paper provides a means of overcoming this difficulty. In particular, it is seen that (up to an additive constant) the function

$-2n^{-1}\log L(h)$

approximates $\mathrm{MISE}(h)$ in the same way as does (3.5), and so the $\hat h$ that maximizes $L(h)$ is optimal in the same sense as the traditional "optimal bandwidth."

The main theorem of this paper also holds in a setting more general than that just discussed. In particular, it is well known that if $f$ has only a bounded $p$-th derivative, where $p < k$, then

$s_f(h) = O(h^{2p}),$

and the optimal (at least in the sense of exponent of algebraic convergence) $h$ may be found by minimizing

$O(n^{-1}h^{-1}) + O(h^{2p}).$

This is perhaps most clearly seen in the results of Stone (1980). It is also well known that $p$ need not be an integer, by either using Sobolev space methods or using Lipschitz conditions on derivatives. This setting is more difficult to handle than the above because there one knows the optimal $h$ is of the form

$c\,n^{-(2k+1)^{-1}}$

and only $c$ need be optimized, while here the exponent is also unknown.

In the closely related setting of nonparametric regression estimation, Stone (1982) has posed the problem (see his Question 3) of finding an optimal bandwidth when $p$ is unknown. The theorem of this paper provides a solution to this problem, in the above sense, by showing that

(3.6)   $-2n^{-1}\log L(h) = 2R + \mathrm{MISE}(h) + o_p(\mathrm{MISE}(h)),$

uniformly over $h$, where the constant $R$ is independent of $h$ and is given by

(3.7)   $R = p - n^{-1}\sum_{j=1}^{n} 1_{[a,b]}(X_j)\log f(X_j),$

with $p = \int_a^b f(x)\,dx$.


The reason that the nonstandard notation $s_f(h)$ (see (3.2)) has been introduced is that it provides a powerful analytic tool. In the setting of $p < k$, the usual Taylor expansion techniques are useless for showing results like (3.6) because they only provide an upper bound on $s_f(h)$. Thus the quantity $s_f(h)$ itself is used everywhere in the proof. Another interesting role of $s_f(h)$ is that its tail behavior (as $h \to 0$) provides a measure of what is usually called "smoothness" of $f$ which is more precise than the traditional Lipschitz conditions on derivatives or indices of Sobolev spaces.

The main theorem of this paper will now be stated formally. First a very mild restriction will be placed on the bandwidth $h$. For some small $\delta > 0$, define the sequences $\{\underline{h}_n\}$ and $\{\overline{h}_n\}$ by

(3.8)   $\underline{h} = n^{-1+\delta}$ and $\overline{h} = n^{-\delta}$,

where here and below the dependence on $n$ is suppressed. It will also be assumed that the density $f$ satisfies:

(3.9)   $f$ is bounded away from 0 on $[a,b]$;

(3.10)   there are constants $M, \gamma > 0$ so that for all $x, y$, $|f(x) - f(y)| \le M|x - y|^{\gamma}$.

Another assumption is that the kernel function $K$ satisfies:

(3.11)   $\int K(x)\,dx = 1$;

(3.12)   $K$ is compactly supported;

(3.13)   there are constants $M, \gamma > 0$ so that for all $x, y$, $|K(x) - K(y)| \le M|x - y|^{\gamma}$.

Finally it will be assumed that the weight function in (1.1) is given by

(3.14)   $w(x) = f(x)^{-1}$ if $x \in [a,b]$, and $w(x) = 0$ otherwise.
Theorem 1: Under the assumptions (3.8)-(3.14), given $\varepsilon > 0$,

$\lim_{n\to\infty} P\left[\sup_{h \in [\underline{h}, \overline{h}]}\left|\frac{-2n^{-1}\log L(h) - 2R - \mathrm{MISE}(h)}{\mathrm{MISE}(h)}\right| > \varepsilon\right] = 0.$


A disturbing feature of this theorem is that it only applies to $h$ in the vanishingly small interval $[\underline{h}, \overline{h}]$. The above computations show that (by (3.10))

(3.15)   $s_f(h) = O(h^{2\gamma}),$

and thus the optimal bandwidth is easily inside the interval for $\delta$ sufficiently small. Also, Monte Carlo experience with $L(h)$ (see Bloomfield and Marron (1984)) indicates this assumption is not a problem in practice. Further reassurance along these lines is provided by

Theorem 2: Under (3.8)-(3.14), if $\hat h = \hat h(n)$ denotes any sequence of maxima of $L(h)$, then

i) $\hat h \to 0$ a.s.;

ii) $\lim_{c \to 0} \limsup_{n\to\infty} P[\hat h < c\,n^{\cdots}] = 0$.

It should be noted that while Theorem 2 does show $\hat h > \underline{h}$ (for $\delta$ sufficiently small), it does not show $\hat h < \overline{h}$ or even establish the consistency of $\hat f(x, \hat h)$. It is intended only to give some backing to the above remarks. To save space, the proof of Theorem 2 will not be given here; the interested reader can find it in the technical report Marron (1983). The proof of (i) is based on techniques of Chow, Geman and Wu (1983), and it appears that these techniques may be further extended to establish the consistency of $\hat f(x, \hat h)$. The proof of (ii) is based on an order statistics result of Cheng (1983).

4.  Remarks 

Remark 4.1. The reader may be surprised that the "vanishing moment" assumption (3.3) is not used in Theorem 1. That theorem says $\hat f(x, \hat h)$ will have the best MISE that is possible for the given $K$, but how good that is is irrelevant to the theorem. Of course, one should choose a reasonably good $K$.


Remark 4.2. The fact that optimality is achieved only for a particular weight function should not be too disappointing. The one used here is quite natural because MISE is proportional to the expected relative square error:

$E\left[\left(\frac{\hat f(X) - f(X)}{f(X)}\right)^2 \,\middle|\, X \in [a,b]\right],$

where $X$ is a further observation from $f$, independent of $X_1, \dots, X_n$ (the constant of proportionality is $p = \int_a^b f(x)\,dx$). It is seen in Marron (1982) that this error norm is precisely the one required for the application of density estimation to the classification problem. It may be seen without too much effort that the indicator function in (2.11) may be replaced by any bounded, measurable, nonnegative function $q(x)$ which is supported inside $[a,b]$, and the theorem will still be true with

$w(x) = q(x)/f(x).$


Remark 4.3. At first glance one might be disturbed by the fact that the MISE that is minimized here is limited to the interval $[a,b]$. In somewhat similar settings, in the case of estimating a regression function, Gasser and Müller (1979) and Rice and Rosenblatt (1983) have observed that such a MISE is strongly affected by the behavior of the unknown function at the endpoints, and hence the bandwidth which minimizes MISE can provide relatively poor estimates in the interior of $[a,b]$. However, with very little effort, one may see that such an "endpoint effect" does not occur in the present setting. This is because the density $f$ extends (and is smooth) outside the interval $[a,b]$, and observations outside $[a,b]$ are employed in the estimator of this paper. Hence the MISE of this paper provides a very reasonable error criterion.

Remark 4.4. As with any asymptotic theory, it still remains to check that the properties described by the asymptotics "take effect" for sample sizes which are not prohibitively large. Preliminary computations (for the paper Bloomfield and Marron (1984)) seem to validate Theorem 1 and the heuristics of Section 2.


5.  Proof  of  Theorem  1. 

This proof uses techniques developed in Hall (1982). It will be useful to define, for $j = 1, \dots, n$,

(5.1)   $\Delta_j = \frac{\hat f_j(X_j,h) - f(X_j)}{f(X_j)}, \qquad \Delta_j^+ = \frac{\hat f_j^+(X_j,h) - f(X_j)}{f(X_j)}.$


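By Lemma 1 of Härdle and Marron (1984), letting $\sup_x$ and $\sup_h$ denote supremum over $x \in [a,b]$ and $h \in [\underline{h}, \overline{h}]$ respectively,

$\sup_x \sup_h |\hat f^+(x,h) - f(x)| \le \sup_x \sup_h |\hat f(x,h) - f(x)| \to 0$

in probability (the first inequality holds because $f \ge 0$, so truncation at 0 cannot increase the error). But by (1.2), (1.3), (3.11) and (3.13), letting $\sup_j$ denote supremum over $j = 1, \dots, n$,

$\sup_j \sup_x \sup_h nh\,|\hat f_j(x,h) - \hat f(x,h)| = \sup_j \sup_x \sup_h \left|(n-1)^{-1}\sum_{i\ne j} K\left(\frac{x - X_i}{h}\right) - K\left(\frac{x - X_j}{h}\right)\right| \le 2\sup_{u\in\mathbb{R}}|K(u)|.$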


Hence, by (3.9) and since $nh \ge n^{\delta}$ for $h \ge \underline{h}$, writing $\#(A)$ for the cardinality of the set

$A = \{j = 1, \dots, n : X_j \in [a,b]\},$

note that

$\sup_h \sup_{j\in A}|\Delta_j^+| \le \sup_h \sup_{j \in A}|\Delta_j| \to 0$

in probability. Now for $n = 1, 2, \dots$ define the event

$U_n = \{\Delta_j^+ = \Delta_j \text{ for each } h \in [\underline{h}, \overline{h}] \text{ and } j \in A\}.$

It follows from the above that

$\lim_{n\to\infty} P[U_n] = 1.$

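From (2.11), (3.7) and the above it follows that, for $h \in [\underline{h}, \overline{h}]$, on the event $U_n$,

$n^{-1}\log L(h) + R = n^{-1}\sum_{j=1}^{n}\left[1_{[a,b]}(X_j)\log(1 + \Delta_j) - \rho(X_j) + p\right]$

$\qquad = n^{-1}\sum_{j=1}^{n}\left[1_{[a,b]}(X_j)\left(\Delta_j - \tfrac12\Delta_j^2 + r_j\right) - \rho(X_j) + p\right]$

$\qquad = n^{-1}\sum_{j=1}^{n}\left[1_{[a,b]}(X_j)\Delta_j - \rho(X_j) + p\right] - \tfrac12 n^{-1}\sum_{j\in A}\Delta_j^2 + n^{-1}\sum_{j\in A} r_j,$

where $r_j = \log(1+\Delta_j) - \Delta_j + \tfrac12\Delta_j^2$ denotes the error term of the Taylor expansion of $\log(1+x)$ at $x = \Delta_j$.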
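The remainder of this proof will be split into two lemmas.

Lemma A: Given $\varepsilon > 0$,

$\lim_{n\to\infty} P\left[\sup_h \frac{\left|n^{-1}\sum_{j=1}^{n}\left[1_{[a,b]}(X_j)\Delta_j - \rho(X_j) + p\right]\right|}{\mathrm{MISE}(h)} > \varepsilon\right] = 0.$

Lemma B: Given $\varepsilon > 0$,

$\lim_{n\to\infty} P\left[\sup_h \frac{\left|n^{-1}\sum_{j\in A}\Delta_j^2 - \mathrm{MISE}(h)\right|}{\mathrm{MISE}(h)} > \varepsilon\right] = 0.$

It is enough to establish these because from Lemma B it follows that, for $\varepsilon > 0$,

$\lim_{n\to\infty} P\left[\sup_h \frac{\left|n^{-1}\sum_{j\in A} r_j\right|}{\mathrm{MISE}(h)} > \varepsilon\right] = 0.$

(Indeed, $|r_j| \le \tfrac23|\Delta_j|^3$ whenever $|\Delta_j| \le \tfrac12$, so that $|n^{-1}\sum_{j\in A} r_j| \le \tfrac23 \sup_{j\in A}|\Delta_j| \cdot n^{-1}\sum_{j\in A}\Delta_j^2$, and $\sup_h\sup_{j\in A}|\Delta_j| \to 0$ in probability.)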
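The remainder bound $|r(x)| \le \tfrac23|x|^3$ on $|x| \le \tfrac12$ can be checked numerically. In this small sketch the grid size and the exclusion of very small $|x|$ (to avoid floating-point cancellation) are arbitrary choices. It evaluates $r(x) = \log(1+x) - x + x^2/2$ on a grid and reports the largest ratio $|r(x)|/|x|^3$.

```python
import math


def taylor_remainder(x: float) -> float:
    """r(x) = log(1 + x) - x + x^2/2, the error of the two-term expansion of log(1 + x)."""
    return math.log1p(x) - x + 0.5 * x * x


def max_ratio(num: int = 100_000) -> float:
    """max |r(x)| / |x|^3 over a grid of x in [-1/2, 1/2], skipping |x| < 1e-3."""
    best = 0.0
    for k in range(num + 1):
        x = -0.5 + k / num
        if abs(x) < 1e-3:
            continue
        best = max(best, abs(taylor_remainder(x)) / abs(x) ** 3)
    return best


if __name__ == "__main__":
    print(round(max_ratio(), 3))  # 0.545, attained at x = -1/2
```

For negative $x$ every term of the series $-\sum_{k\ge 3}|x|^k/k$ has the same sign, so the ratio is largest at $x = -\tfrac12$, where it equals about $0.545 < \tfrac23$.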

The proof of Lemma A is quite similar in spirit to that of Lemma B. Some details are different, but these are very similar to the proof of Lemma 2a in Härdle and Marron (1984). Hence this proof is omitted.

Proof of Lemma B:

First, for $n = 1, 2, \dots$, partition the interval $[\underline{h}, \overline{h}]$ by the following means. For $\ell = 0, 1, \dots$ define

$h_\ell = \left(n^{1-\delta} - \ell\, n^{-3/\gamma}\right)^{-1},$

then find $L$ so that

$h_{L-1} < \overline{h} \le h_L,$

and redefine $h_L$ to be $\overline{h}$. Note that the dependence of $h_\ell$ and $L$ on $n$ has been suppressed. Note also that

(5.2)   $|h_\ell^{-1} - h_{\ell+1}^{-1}| \le n^{-3/\gamma},$

and that, as $n \to \infty$,

(5.3)   $L = o(n^{1+3/\gamma}).$

It will be convenient to adopt the shorthand:

(5.4)   $A(h) = n^{-1}\sum_{j=1}^{n}\left[1_{[a,b]}(X_j)\Delta_j^2 - \mathrm{MISE}(h)\right]\mathrm{MISE}(h)^{-1}.$
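The grid used in (5.2) and (5.3) is equally spaced in $1/h$ rather than in $h$. A sketch of its construction follows; it assumes the reading $h_\ell = (n^{1-\delta} - \ell n^{-3/\gamma})^{-1}$ given above, the function names are hypothetical, and it returns the unmodified $h_L$ (the text then resets $h_L$ to $\overline{h}$).

```python
import math


def grid_point(ell: int, n: float, delta: float, gamma: float) -> float:
    """h_ell = (n^{1-delta} - ell * n^{-3/gamma})^{-1}; h_0 = n^{-1+delta} is the lower endpoint."""
    return 1.0 / (n ** (1.0 - delta) - ell * n ** (-3.0 / gamma))


def num_grid_points(n: float, delta: float, gamma: float) -> int:
    """Smallest L with h_L >= h_bar = n^{-delta}: L = ceil((n^{1-delta} - n^{delta}) * n^{3/gamma})."""
    return math.ceil((n ** (1.0 - delta) - n ** delta) * n ** (3.0 / gamma))


if __name__ == "__main__":
    delta, gamma = 0.2, 1.0
    for n in (1e2, 1e4, 1e6):
        L = num_grid_points(n, delta, gamma)
        # (5.3): L / n^{1+3/gamma} should tend to 0 (roughly like n^{-delta}).
        print(f"n={n:.0e}  L={L:.3e}  L/n^(1+3/gamma)={L / n ** (1 + 3 / gamma):.4f}")
```

The count $L$ grows like $n^{1-\delta+3/\gamma}$, which is why the union bound in the proof of Lemma B1 needs moments of order $M$ large enough to beat a polynomial factor.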


The idea of this proof is to show that $A(h)$ converges uniformly over the "grid points" $h_\ell$, and then to "fill in the gaps" with Lipschitz continuity. More formally, for $\varepsilon > 0$, note that

$P[\sup_h |A(h)| > \varepsilon] \le I + II,$

where the behavior at the grid points is controlled by

$I = P[\sup_\ell |A(h_\ell)| > \varepsilon/2]$

(where $\sup_\ell$ denotes supremum over $\ell = 1, \dots, L$), and the behavior between grid points is controlled by

$II = P[\sup_{\ell,h}|A(h) - A(h_\ell)| > \varepsilon/2]$

(where $\sup_{\ell,h}$ denotes supremum over $\ell = 1, \dots, L$ and $h \in [h_{\ell-1}, h_\ell]$). The proof of Lemma B will be complete when the following lemmas are established.
B  will  be  complete  when  the  following  lemmas  are  established. 

Lemma B1: $I \to 0$.

Lemma B2: $II \to 0$.


Proof of Lemma B1:

Note that by an obvious extension of the Chebyshev Inequality, for $M = 2, 4, 6, \dots$,

$P\left[\sup_\ell |A(h_\ell)| > \varepsilon/2\right] \le \sum_{\ell=1}^{L} P\left[|A(h_\ell)| > \varepsilon/2\right] \le L \sup_\ell P\left[|A(h_\ell)| > \varepsilon/2\right] \le L \sup_\ell E\left(2A(h_\ell)/\varepsilon\right)^M.$

Thus, by (5.3), it is enough to show that

$\sup_h n^{1+3/\gamma} E\left(A(h)^M\right) \to 0,$

for $M$ sufficiently large.

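Next observe that, by computations similar to those leading to (3.1),

$E(A(h)) = \left[E\left([\hat f_j(X_j,h) - f(X_j)]^2 f(X_j)^{-2} 1_{[a,b]}(X_j)\right) - \mathrm{MISE}\right]\mathrm{MISE}^{-1}$

$\qquad = \left[E\int[\hat f_j(x,h) - f(x)]^2 w(x)\,dx - \mathrm{MISE}\right]\mathrm{MISE}^{-1}$

$\qquad = \left[(n-1)^{-1}h^{-1}\left(\int f(y)w(y)\,dy\right)\left(\int K(u)^2\,du\right) + o(n^{-1}h^{-1}) + s_f(h) - \mathrm{MISE}\right]\mathrm{MISE}^{-1},$

and so

$\sup_h E(A(h)) = O(n^{-1}).$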


Now, using a cumulant expansion (see, for example, (3.33) of Kendall and Stuart (1963)) of the $M$-th centered moment of $A(h)$, to complete the proof of Lemma B1 it is enough to show that, for $M$ sufficiently large and $m = 2, \dots, M$,

$\sup_h \mathrm{cum}_m(A(h), \dots, A(h)) = o\left(n^{-(1+3/\gamma)m/M}\right),$

where $\mathrm{cum}_m$ denotes the $m$-th order cumulant.
m 

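Observe that from (1.3), (5.1) and (5.4),

$A(h) = n^{-1}\sum_{j=1}^{n}\left[\left((n-1)^{-1}\sum_{i\ne j} h^{-1}K\left(\frac{X_j - X_i}{h}\right) - f(X_j)\right)^2 f(X_j)^{-2} 1_{[a,b]}(X_j) - \mathrm{MISE}\right]\mathrm{MISE}^{-1} = n^{-1}\sum_{j=1}^{n}\sum_{i\ne j}\sum_{i'\ne j} V_{ij}V_{i'j} - 1,$

where, by (3.14),

(5.5)   $V_{ij} = (n-1)^{-1}\left[h^{-1}K\left(\frac{X_j - X_i}{h}\right) - f(X_j)\right] f(X_j)^{-1/2}\, w(X_j)^{1/2}\,\mathrm{MISE}^{-1/2}.$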

So, by using linearity properties of cumulants (see, for example, (iv) and (v) of Theorem 2.3.1 in Brillinger (1979)), the proof of Lemma B1 will be complete when it is seen that, for $M$ sufficiently large and $m = 2, \dots, M$,

(5.6)   $\sup_h n^{-m}\sum_{(j)}\sum_{(i)}\sum_{(i')}\left|\mathrm{cum}_m\left(V_{i_1 j_1}V_{i'_1 j_1}, \dots, V_{i_m j_m}V_{i'_m j_m}\right)\right| = o\left(n^{-(1+3/\gamma)m/M}\right),$

where $\sum_{(j)}$ denotes summation over $j_1, \dots, j_m = 1, \dots, n$, where $\sum_{(i)}$ denotes summation over $i_1, \dots, i_m = 1, \dots, n$ subject to the restrictions $i_1 \ne j_1, \dots, i_m \ne j_m$, and where $\sum_{(i')}$ denotes summation over $i'_1, \dots, i'_m = 1, \dots, n$ subject to the restrictions $i'_1 \ne j_1, \dots, i'_m \ne j_m$.

By another of the properties of cumulants, note that many of the terms in the summation (5.6) will be 0 because of the independence of $X_1, \dots, X_n$. The nonzero terms will be handled by grouping them according to the pattern of indices and proceeding casewise.

First note that by the usual moment expansion of cumulants (see, for example, (3.39) of Kendall and Stuart (1963)), each $\mathrm{cum}_m$ may be expanded into a linear combination, the first term of which is

(5.7)   $E\left[V_{i_1 j_1}V_{i'_1 j_1}\cdots V_{i_m j_m}V_{i'_m j_m}\right],$

and the remaining terms of which are multiples of products of moments over all the various partitions of $V_{i_1 j_1}V_{i'_1 j_1}, \dots, V_{i_m j_m}V_{i'_m j_m}$.

Next a means of counting the nonzero terms in (5.6) will be developed. Since special attention must be paid to duplications among $i_1, \dots, i_m$, $i'_1, \dots, i'_m$ and $j_1, \dots, j_m$, the following relabeling of the indices will be made:

(i) Suppose that $r$ is the number of $j_1, \dots, j_m$ that are distinct. Relabel these indices (each time they occur) by $j_1, \dots, j_r$.

(ii) Suppose that the number of $i_1, \dots, i_m, i'_1, \dots, i'_m$ that are the same as one of $j_1, \dots, j_r$ is $s$. Each of these will now be denoted by the appropriate $j$.

(iii) Suppose that, out of what remain of $i_1, \dots, i_m, i'_1, \dots, i'_m$, $t$ of them are distinct. Denote these (each time they occur) by $i_1, \dots, i_t$. Also let $s_1$ denote the number of times the new $i_1$ appears, and similarly for $s_2, \dots, s_t$.

Now, by rearranging the $V$'s, note that (5.7) may be rewritten as

$E\left[V_{JJ}\text{'s}\; V_{i_1 J}\text{'s} \cdots V_{i_t J}\text{'s}\right],$

where "$V_{JJ}$'s" denotes the product of all $V$'s whose first index is one of $j_1, \dots, j_r$, and where "$V_{i_1 J}$'s" denotes the product of all $V$'s whose first index is $i_1$, etc. Since there seems to be no chance of confusion, both notations will be used in the following.

Now group the nonzero cumulants in (5.6) according to the pattern of duplication of indices (two cumulants fall in the same group when one can be obtained from the other by a one-to-one relabeling of the indices). Note that the number of cumulants falling into each group is (as $n \to \infty$) of the order $O(n^{r+t})$. So to verify (5.6), it is enough to show that, for each duplication group,

$\sup_h n^{r+t-m}\left|\mathrm{cum}_m(\cdots)\right| = o\left(n^{-(1+3/\gamma)m/M}\right),$

where $M$ is sufficiently large and $m = 2, 3, \dots, M$.
Let $J = \{X_{j_1}, \dots, X_{j_r}\}$, and let $E[\,\cdot\,|J]$ denote the usual conditional (on $X_{j_1}, \dots, X_{j_r}$) expectation operator. Now, letting $B$ denote a generic constant, by (3.9), (3.13), (5.5) and integration by substitution,

$n^{r+t-m}\left|E\left(V_{JJ}\text{'s}\;V_{i_1J}\text{'s}\cdots V_{i_tJ}\text{'s}\right)\right| = n^{r+t-m}\left|E\left(V_{JJ}\text{'s}\;E[V_{i_1J}\text{'s}\,|\,J]\cdots E[V_{i_tJ}\text{'s}\,|\,J]\right)\right|$

$\qquad \le B\,n^{r+t-m}\, n^{-s}h^{-s}\, n^{-s_1}h^{-(s_1-1)}\cdots n^{-s_t}h^{-(s_t-1)}\,\mathrm{MISE}^{-m}$

(5.8)   $\qquad = B\,n^{-(3m-r-t)}h^{-(2m-t)}\mathrm{MISE}^{-m} = B\left[(nh)^{-1}\mathrm{MISE}^{-1}\right]^m (nh)^{-(2m-r-t)}\, h^{m-r}.$
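The exponent bookkeeping behind the two equalities in (5.8) can be written out as follows. It uses only the fact that each of the $2m$ factors $V$ has exactly one first index, so that $s + s_1 + \cdots + s_t = 2m$:

```latex
\begin{aligned}
\text{power of } n:&\quad (r+t-m) - s - (s_1 + \cdots + s_t) = r + t - 3m,\\
\text{power of } h:&\quad -s - \sum_{k=1}^{t}(s_k - 1) = -(s + s_1 + \cdots + s_t) + t = -(2m - t),\\
\text{regrouping:}&\quad n^{-(3m-r-t)}h^{-(2m-t)} = (nh)^{-m}\,(nh)^{-(2m-r-t)}\,h^{\,m-r},
\end{aligned}
```

where the last line holds because the powers of $h$ on the right add up to $-m-(2m-r-t)+(m-r) = -(2m-t)$. In the cases below, the factor $[(nh)^{-1}\mathrm{MISE}^{-1}]^m$ is bounded via (3.1), so the decay must come from $(nh)^{-(2m-r-t)}h^{m-r}$.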

While this bound is sufficient to handle many of the patterns of duplication of indices, refined computations of several types are required for others.

To see what cases are necessary, note that in (5.6) the $\mathrm{cum}_m(\cdots)$ are nonzero only when no subset of the arguments of $\mathrm{cum}_m$ is independent of the remaining arguments (see, for example, (iii) in Theorem 2.3.1 of Brillinger (1979)). In other words, there must be at least $m-1$ pairs of arguments of $\mathrm{cum}_m(\cdots)$ which have an index in common. Thus, for each nonzero $\mathrm{cum}_m(\cdots)$, the following counting argument is valid:

(5.9)   $m - 1 \le \#(\text{pairs with a common index}) \le \#(\text{pairs with an } i \text{ in common}) + \#(\text{pairs with a } j \text{ in common})$

$\qquad \le [\#(i\text{'s available}) - \#(\text{distinct } i\text{'s})] + [\#(V_{JJ}\text{'s}) + \#(j\text{'s available}) - \#(\text{distinct } j\text{'s})]$

$\qquad = [(2m - s) - t] + [s + m - r] = 3m - t - r.$

It follows from this that

(5.10)   $2m - t - r \ge -1.$

The bound (5.8) will now be either used or refined in a casewise manner.

Case 1: $m - r \ge m/12$ and $2m - r - t \ge 0$.
It follows from (3.1) that

(5.11)   $\sup_h (nh)^{-1}\mathrm{MISE}(h)^{-1} = O(1).$

Thus from (3.8) and (5.8),

$\sup_h n^{r+t-m}\left|E\left(V_{JJ}\text{'s}\;V_{i_1J}\text{'s}\cdots V_{i_tJ}\text{'s}\right)\right| = O\left(\overline{h}^{\,m/12}\right) = O\left(n^{-\delta m/12}\right).$

But similar computations show that the same bound may be obtained for the other products of moments appearing in $\mathrm{cum}_m(\cdots)$. Thus

$\sup_h n^{r+t-m}\left|\mathrm{cum}_m\left(V_{i_1j_1}V_{i'_1j_1}, \dots, V_{i_mj_m}V_{i'_mj_m}\right)\right| = O\left(n^{-\delta m/12}\right) = o\left(n^{-(1+3/\gamma)m/M}\right),$

by taking $M$ sufficiently large.


Case  2:  m-r  2  m/12 


2m-r-t  =  -1  . 


Here  the  basic  bound  (5.8)  needs  some  modification.  Since  r  <  m,  note  that 


m-t  =  r-m-1  ■  0 


and  hence 


t  >  m. 


Thus  at  least  two  of  s^,...,st  must  be  equal  to  1.  Now  relabel  ij,...,i  so  that 

s^_j  and  s^  are  both  1.  The  bound  (5.8)  may  now  be  modified  to  give 

r+t-rn  E(V. . ’s  V.  . ’s*  •  -V  .  *s)  1  • 
n  11  1^1  i  t-l 


“  S .  —  s 

r+t-m  n  h 


-8i,-(sr1)  _st-2 -(st-2-l) 

n  !i  n  h 


M1SFS^ 

MISE  1 


A-2/Z 


•  E,E[V  .  | .!  ]  E  [  V 

t-lJ  1tJ 


But  f rom  (5.5) 


r  x.-x 

E[V11|.J]  =  (n-l)'1|j^K(i — 3f(x)dx  -  f(X  J  |f(X  J_?iw(X.)',!MlSE~li 


Thus,  bv  (3.2)  and  integration  by  substitution, 


(5.12)  E(EIU  Jj])  =  (n-l)  sf(h)MISE(h)  , 
and  so,  by  the  Schwartz  Inequality, 


E  E[V.  .  '.1]E[V.  |J]  ;  <  (n-l)  1  sr(h)MlSE(h)  1  . 

lt-lJ  V  "  ‘ 


Hence, from (3.1),

$$\sup_h n^{r+t-m}\bigl|E(V_{i_1J}\text{'s}\,V_{i_2J}\text{'s}\cdots V_{i_tJ}\text{'s})\bigr| \le \sup_h B\bigl[(nh)^{-(m-1)}s_f(h)\,\mathrm{MISE}(h)^{-m}\bigr](nh)^{-(2m-r-t+1)}h^{m-r} = O(h^{m/12}) = O(n^{-\delta m/12}). \tag{5.13}$$

But similar computations show that the same bound may be obtained for the other products of moments appearing in $\mathrm{cum}_m(\ )$. Thus

$$\sup_h n^{r+t-m}\bigl|\mathrm{cum}_m(V_{i_1j_1}V_{i_1'j_1},\dots,V_{i_mj_m}V_{i_m'j_m})\bigr| = O(n^{-\delta m/12}) = o(n^{-(1+3/\gamma)m/M}),$$

by taking M sufficiently large.

Case 3: $m-r < m/12$ and $2m-r-t \ge m/12$.

It follows from (3.8) that

$$(nh)^{-1} \le n^{-\delta}.$$

Thus, since $r \le m$, by (5.8) and (3.11),

$$\sup_h n^{r+t-m}\bigl|E(V_{i_1J}\text{'s}\,V_{i_2J}\text{'s}\cdots V_{i_tJ}\text{'s})\bigr| = O(n^{-\delta m/12}).$$

Hence, as above,

$$\sup_h n^{r+t-m}\bigl|\mathrm{cum}_m(V_{i_1j_1}V_{i_1'j_1},\dots,V_{i_mj_m}V_{i_m'j_m})\bigr| = O(n^{-\delta m/12}) = o(n^{-(1+3/\gamma)m/M})$$

for M sufficiently large.

Case 4: $m-r < m/12$, $0 \le 2m-r-t < m/12$, and $s \ge m/3$.

For this case, consider the factors $E[V_{i_kJ}\text{'s}\mid J]$ appearing in the computation (5.8). It will be convenient to apply the name "singleton" to those for which the corresponding $s_k = 1$. Note that

$$t - \#(\text{singletons}) \le \#(i\text{'s available}) - \#(\text{places to put } i\text{'s}) = (2m-s)-t.$$

Thus,

$$\#(\text{singletons}) \ge 2(t-m)+s \ge 2(m-r-m/12)+s \ge -2m/12+m/3 = m/6. \tag{5.14}$$
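The singleton count uses an elementary counting fact: if $t$ positive integers $s_1,\dots,s_t$ sum to $N$, then at least $2t-N$ of them equal 1, since every non-singleton contributes at least 2 to the sum. The Python sketch below is only a sanity check, not part of the proof: it verifies that fact by brute force for small $N$, and it scans integer tuples in the Case 4 region (read here as $m-r<m/12$, $0\le 2m-r-t<m/12$, $s\ge m/3$, $0\le r\le m$; the scanned ranges for $t$ and $s$ are assumptions) for violations of $2(t-m)+s \ge m/6$.

```python
from itertools import product


def compositions(total, parts):
    """Yield every tuple of `parts` positive integers that sums to `total`."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    # The first part must leave at least 1 for each remaining part.
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def singleton_bound_holds(max_total=10):
    """Check #{k : s_k = 1} >= 2t - sum(s_k) for all small compositions."""
    for total in range(1, max_total + 1):
        for t in range(1, total + 1):
            for s in compositions(total, t):
                singletons = sum(1 for x in s if x == 1)
                if singletons < 2 * t - total:
                    return False
    return True


def case4_counterexamples(max_m=30):
    """Scan the (assumed) Case 4 region for violations of 2(t-m)+s >= m/6.

    All conditions are multiplied by 12 (or 3, 6) to stay in integers.
    Returns (number of tuples checked, list of violating tuples).
    """
    checked, bad = 0, []
    for m in range(1, max_m + 1):
        for r, t in product(range(m + 1), range(1, 2 * m + 2)):
            gap = 2 * m - r - t
            if not (12 * (m - r) < m and 0 <= 12 * gap < m):
                continue
            for s in range(2 * m + 1):
                if 3 * s < m:          # Case 4 requires s >= m/3
                    continue
                checked += 1
                if 6 * (2 * (t - m) + s) < m:
                    bad.append((m, r, t, s))
    return checked, bad


print(singleton_bound_holds(), case4_counterexamples()[1])
```

Scaling by integers avoids any floating-point doubt at the boundaries $m/12$ and $m/3$.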

Now it is desired to use the computation (5.12) to generate extra factors of $s_f(h)$ from the above singletons. To do this, for each of at most 2 [...] singletons having that particular $j$ may be employed. Let $u$ count the number of singletons that may be used. Since

$$r = \#(\text{distinct } j\text{'s}) \ge 11m/12,$$

note that

$$u \ge m/6 - m/12 = m/12. \tag{5.13}$$

Now relabel $i_1,\dots,i_t$ so that the above singletons are indexed by $i_{t-u+1},\dots,i_t$. Note that the computation (5.8) may be refined to give


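$$n^{r+t-m}\bigl|E(V_{i_1J}\text{'s}\,V_{i_2J}\text{'s}\cdots V_{i_tJ}\text{'s})\bigr| \le n^{r+t-m}\,\frac{n^{-s_1}h^{-(s_1-1)}}{\mathrm{MISE}^{s_1/2}}\cdots\frac{n^{-s_{t-u}}h^{-(s_{t-u}-1)}}{\mathrm{MISE}^{s_{t-u}/2}}\,\bigl|E\bigl(E[V_{i_{t-u+1}J}\mid J]\cdots E[V_{i_tJ}\mid J]\bigr)\bigr|$$

$$\le B\,n^{-(3m-r-t)}h^{-(2m-t)}s_f(h)^{u/2}\,\mathrm{MISE}^{-m} \le B\bigl[(nh)^{-m}\mathrm{MISE}^{-m}\bigr](nh)^{-(2m-r-t)}h^{m-r}s_f(h)^{m/24}.$$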
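As a bookkeeping check on the last step, reading its final factor as $h^{m-r}$, the exponents of $n$ and $h$ in $n^{-(3m-r-t)}h^{-(2m-t)}$ and in $(nh)^{-m}(nh)^{-(2m-r-t)}h^{m-r}$ should agree, and $s_f(h)^{u/2} \le s_f(h)^{m/24}$ should hold once $u \ge m/12$, provided $0 < s_f(h) \le 1$ (an assumption for small $h$). A small Python sketch that tests both over a grid:

```python
def left_exponents(m, r, t):
    """(n-exponent, h-exponent) of n^{-(3m-r-t)} h^{-(2m-t)}."""
    return (-(3 * m - r - t), -(2 * m - t))


def right_exponents(m, r, t):
    """(n-exponent, h-exponent) of (nh)^{-m} (nh)^{-(2m-r-t)} h^{m-r}."""
    gap = 2 * m - r - t
    return (-m - gap, -m - gap + (m - r))


mismatches = [
    (m, r, t)
    for m in range(1, 40)
    for r in range(m + 1)
    for t in range(1, 2 * m + 2)
    if left_exponents(m, r, t) != right_exponents(m, r, t)
]

# With 0 < s_f <= 1, a larger exponent gives a smaller power, so
# s_f^{u/2} <= s_f^{m/24} whenever u >= m/12 (u >= ceil(m/12) for integer u).
power_ok = all(
    sf ** (u / 2) <= sf ** (m / 24) * (1 + 1e-12)
    for m in range(1, 40)
    for u in range(-(-m // 12), m + 1)
    for sf in (0.9, 0.5, 0.1, 1e-3)
)

print(mismatches, power_ok)
```

The constant $B$ and the factor $\mathrm{MISE}^{-m}$ appear identically on both sides, so only the exponents of $n$ and $h$ need comparing.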

But now, from (3.8) and (3.15), as above,

$$\sup_h n^{r+t-m}\bigl|E(V_{i_1J}\text{'s}\,V_{i_2J}\text{'s}\cdots V_{i_tJ}\text{'s})\bigr| = O\bigl((h^{2\gamma})^{m/24}\bigr) = O(n^{-\delta\gamma m/12}).$$

Thus, as above,

$$\sup_h n^{r+t-m}\bigl|\mathrm{cum}_m(V_{i_1j_1}V_{i_1'j_1},\dots,V_{i_mj_m}V_{i_m'j_m})\bigr| = O(n^{-\delta\gamma m/12}) = o(n^{-(1+3/\gamma)m/M}),$$

for M sufficiently large.

Case 5: $m-r < m/12$, $2m-r-t = -1$, and $s \ge m/3$.


This case is an extension of Case 4 in the same way that Case 2 extends Case 1. Note that in the present case, the computation (5.14) can be improved to

$$\#(\text{singletons}) \ge 2(m-r+1)+s \ge 2+m/3.$$

Thus (5.13) can be improved to

$$u \ge 2+m/4.$$

The extra two singletons are used to generate an extra $s_f(h)$, which is used as in (5.13). The result is:


$$\sup_h n^{r+t-m}\bigl|\mathrm{cum}_m(V_{i_1j_1}V_{i_1'j_1},\dots,V_{i_mj_m}V_{i_m'j_m})\bigr| = O(n^{-\delta\gamma m/4}) = o(n^{-(1+3/\gamma)m/M}),$$

for M sufficiently large.

Case 6: $m-r < m/12$, $2m-r-t \ge 0$, and $s < m/3$.


First recall that $\mathrm{cum}_m(\ )$ is nonzero only if at least $(m-1)$ pairs of arguments of $\mathrm{cum}_m(\ )$ have an index in common. Let $v$ denote the number of such pairs which have an $i$ in common, but different $j$'s. The counting argument (5.9) may be modified to give

$$m-1 \le v + \#(\text{pairs where common index is a } j) \le v + [s+m-r].$$

Thus,

$$v \ge m-1-[s+m-r] \ge m-1-[m/3+m/12] = 7m/12-1. \tag{5.16}$$

Note that a "pair with an $i$ in common, but different $j$'s" arises from having factors $V_{ij}$ and $V_{ij'}$ (for some $j \ne j'$). Now, given $X_1,\dots,X_n$, define the random variable $Z$ by

$$Z = \#(\text{such pairs with } |X_j - X_{j'}| > 2K^*h),$$

where $K^*$ denotes the length of the compact support of the kernel function $K$.




Note that, for $z = 0,\dots,v$,

$$P[Z = z] = O(h^{v-z}).$$
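The bound on $P[Z=z]$ rests on each pair with $|X_j-X_{j'}|\le 2K^*h$ having probability of order $h$. As an illustration only, assume $f$ is the standard normal density and $K^*=2$ (a kernel supported on $[-1,1]$). Then $X_j-X_{j'}\sim N(0,2)$, the probability equals $\operatorname{erf}(K^*h)$, and its ratio to $h$ tends to $2K^*/\sqrt{\pi}$. The Python sketch below prints that ratio for shrinking $h$ and adds a seeded Monte Carlo check:

```python
import math
import random

K_STAR = 2.0  # assumed support length of K (e.g. a kernel on [-1, 1])


def close_pair_prob_exact(h, k_star=K_STAR):
    """P(|X - X'| <= 2 k_star h) for X, X' iid N(0, 1), using X - X' ~ N(0, 2)."""
    return math.erf(k_star * h)


def close_pair_prob_mc(h, k_star=K_STAR, reps=100_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(
        abs(rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0)) <= 2 * k_star * h
        for _ in range(reps)
    )
    return hits / reps


# Small-h limit of P/h: 2 * (2 K* h) * int f^2 / h with int f^2 = 1 / (2 sqrt(pi)).
limit = 2 * K_STAR / math.sqrt(math.pi)

for h in (0.2, 0.1, 0.05, 0.01):
    p = close_pair_prob_exact(h)
    print(f"h={h:<5} P={p:.5f}  P/h={p / h:.4f}  limit={limit:.4f}")
print("Monte Carlo at h=0.05:", close_pair_prob_mc(0.05))
```

The ratio $P/h$ settling to a constant is the "$O(h)$ per close pair" behaviour the argument uses.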


Also note that if one of the above pairs comes from $V_{i_1j_1}$ and $V_{i_1j_2}$ (for example) and $|X_{j_1}-X_{j_2}| > 2K^*h$, then

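$$E[V_{i_1J}\text{'s}\mid J] = E\bigl[V_{i_1j_1}^{q_1}V_{i_1j_2}^{q_2}\cdots V_{i_1j_r}^{q_r}\mid J\bigr] = \int E\bigl[V_{j_1}(x)^{q_1}\cdots V_{j_r}(x)^{q_r}\mid J\bigr]f(x)\,dx,$$

where $q_1, q_2 > 0$, and where

$$V_j(x) = (n-1)^{-1}\Bigl[h^{-1}K\Bigl(\frac{x-X_j}{h}\Bigr) - f(X_j)\Bigr]f(X_j)^{-1}w(X_j)\,\mathrm{MISE}^{-1/2}.$$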


From this it follows that

$$\bigl|E[V_{i_1J}\text{'s}\mid J]\bigr| \le \int\bigl|E[V_{j_1}(x)^{q_1}\cdots V_{j_r}(x)^{q_r}\mid J]\bigr|f(x)\,dx = \int_{\{x:|x-X_{j_1}|\le K^*h\}} + \int_{\{x:|x-X_{j_2}|\le K^*h\}} + \int_{\{x:|x-X_{j_1}|>K^*h\text{ and }|x-X_{j_2}|>K^*h\}} \le \cdots \le B\,h\,n^{-s_1}h^{-(s_1-1)}.$$


By similar computations, for each of the above pairs $V_{ij}$ and $V_{ij'}$, on the event $|X_j - X_{j'}| > 2K^*h$, at least one of the $E[V_{ij}\text{'s}\mid X_j\text{'s}]$ allows the factoring out of an additional power of $h$. Thus, when $Z = z$, an additional $h^z$ may be used in the basic bound (5.8). Letting $E(\,\cdot\,; Z=z)$ denote expectation only over the event $Z = z$, (5.8) may be modified to


$$n^{r+t-m}\bigl|E(V_{i_1J}\text{'s}\,V_{i_2J}\text{'s}\cdots V_{i_tJ}\text{'s})\bigr| \le \sum_{z=0}^{v} n^{r+t-m}\bigl|E(V_{i_1J}\text{'s}\,V_{i_2J}\text{'s}\cdots V_{i_tJ}\text{'s};\,Z=z)\bigr|.$$


where 


$$N(h) = n^{-1}\sum_{j=1}^{n}\bigl[\hat f_j(X_j,h) - f(X_j)\bigr]^2 f^{-1}(X_j)\,w(X_j).$$
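To make the definition concrete, the Python sketch below computes $N(h)$ for a simulated sample using the leave-one-out estimator of (1.3). The choices are assumptions made only for illustration, not the paper's setting: $f$ is the standard normal density, $K$ is the Epanechnikov kernel, $w$ is the indicator of $[-2,2]$, and the helper names are hypothetical.

```python
import math
import random


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def normal_pdf(x):
    """Assumed true density f: standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def weight(x):
    """Assumed weight function w: indicator of [-2, 2]."""
    return 1.0 if abs(x) <= 2.0 else 0.0


def loo_estimate(data, j, h, kernel=epanechnikov):
    """Leave-one-out estimate f_j(X_j, h): kernel sum over i != j."""
    n = len(data)
    xj = data[j]
    total = sum(kernel((xj - xi) / h) for i, xi in enumerate(data) if i != j)
    return total / ((n - 1) * h)


def n_of_h(data, h, f=normal_pdf, w=weight):
    """N(h) = n^{-1} sum_j [f_j(X_j, h) - f(X_j)]^2 f(X_j)^{-1} w(X_j)."""
    n = len(data)
    return sum(
        (loo_estimate(data, j, h) - f(x)) ** 2 * w(x) / f(x)
        for j, x in enumerate(data)
    ) / n


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)]
for h in (0.1, 0.3, 0.6, 1.2):
    print(f"h={h}: N(h)={n_of_h(sample, h):.5f}")
```

Each evaluation costs $O(n^2)$ kernel calls, which is fine for an illustration of this size.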


Note that for $\ell = 1,\dots,L$ and $h \in [h_\ell, h_{\ell+1}]$,

$$|A(h) - A(h_\ell)| \le \frac{|N(h)-N(h_\ell)|}{\mathrm{MISE}(h)} + \frac{|N(h_\ell)|}{\mathrm{MISE}(h_\ell)}\cdot\frac{|\mathrm{MISE}(h_\ell)-\mathrm{MISE}(h)|}{\mathrm{MISE}(h)}.$$
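The inequality above is the triangle inequality applied to an exact identity. Assuming, as the notation suggests, that $A(h)=N(h)/\mathrm{MISE}(h)$ (the definition of $A$ is not in this excerpt), one has $A(h)-A(h_\ell) = [N(h)-N(h_\ell)]/\mathrm{MISE}(h) + [N(h_\ell)/\mathrm{MISE}(h_\ell)]\,[\mathrm{MISE}(h_\ell)-\mathrm{MISE}(h)]/\mathrm{MISE}(h)$. A randomized Python check of the resulting bound, with arbitrary positive numbers standing in for the MISE values:

```python
import random


def ratio_gap(n_h, n_l, m_h, m_l):
    """|A(h) - A(h_l)| under the assumed definition A = N / MISE."""
    return abs(n_h / m_h - n_l / m_l)


def ratio_gap_bound(n_h, n_l, m_h, m_l):
    """Right-hand side of the displayed inequality."""
    return abs(n_h - n_l) / m_h + abs(n_l) / m_l * abs(m_l - m_h) / m_h


rng = random.Random(2)
worst_slack = float("inf")
for _ in range(10_000):
    n_h, n_l = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    m_h, m_l = rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0)  # MISE > 0
    slack = ratio_gap_bound(n_h, n_l, m_h, m_l) - ratio_gap(n_h, n_l, m_h, m_l)
    worst_slack = min(worst_slack, slack)

print("smallest slack:", worst_slack)  # nonnegative up to rounding error
```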


Thus, by Lemma B1, the proof of Lemma B2 will be complete when it is seen that

$$\sup_{\ell,h} |N(h)-N(h_\ell)|\,\mathrm{MISE}(h)^{-1} \to 0 \tag{5.17}$$

in probability, and that

$$\sup_{\ell,h} |\mathrm{MISE}(h_\ell)-\mathrm{MISE}(h)|\,\mathrm{MISE}(h)^{-1} \to 0. \tag{5.18}$$


To verify (5.17), note that by (1.3) and the algebraic identity $a^2-b^2=(a-b)(a+b)$,

$$|N(h)-N(h_\ell)| \le n^{-1}\sum_{j=1}^{n}\Bigl|\bigl(\hat f_j(X_j,h)-\hat f_j(X_j,h_\ell)\bigr)\bigl(\hat f_j(X_j,h)+\hat f_j(X_j,h_\ell)-2f(X_j)\bigr)f^{-1}(X_j)w(X_j)\Bigr|$$

$$\le n^{-1}\sum_{j=1}^{n}(n-1)^{-1}\sum_{i\ne j}\Bigl|h^{-1}K\Bigl(\frac{X_j-X_i}{h}\Bigr)-h_\ell^{-1}K\Bigl(\frac{X_j-X_i}{h_\ell}\Bigr)\Bigr|\cdot\Bigl(\sup_x\hat f_j(x,h)+\sup_x\hat f_j(x,h_\ell)+2f(X_j)\Bigr)f^{-1}(X_j)w(X_j).$$
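The first inequality comes from the pointwise identity $(a-f)^2-(b-f)^2=(a-b)(a+b-2f)$ with $a=\hat f_j(X_j,h)$ and $b=\hat f_j(X_j,h_\ell)$, followed by the triangle inequality. The Python sketch below checks it with made-up numbers standing in for the estimates, densities and weights (all hypothetical values):

```python
import random


def n_value(est, truth, weights):
    """n^{-1} sum_j (est_j - f_j)^2 w_j / f_j, i.e. the form of N(h)."""
    return sum((a - f) ** 2 * w / f for a, f, w in zip(est, truth, weights)) / len(est)


def first_bound(est_h, est_l, truth, weights):
    """n^{-1} sum_j |(a_j - b_j)(a_j + b_j - 2 f_j) f_j^{-1} w_j|."""
    terms = (
        abs((a - b) * (a + b - 2 * f) * w / f)
        for a, b, f, w in zip(est_h, est_l, truth, weights)
    )
    return sum(terms) / len(est_h)


rng = random.Random(3)
n = 50
truth = [rng.uniform(0.05, 0.4) for _ in range(n)]      # stands in for f(X_j) > 0
weights = [rng.choice([0.0, 1.0]) for _ in range(n)]     # stands in for w(X_j)
est_h = [f + rng.gauss(0.0, 0.05) for f in truth]        # stands in for f_j(X_j, h)
est_l = [a + rng.gauss(0.0, 0.01) for a in est_h]        # stands in for f_j(X_j, h_l)

lhs = abs(n_value(est_h, truth, weights) - n_value(est_l, truth, weights))
rhs = first_bound(est_h, est_l, truth, weights)
print(lhs, rhs, lhs <= rhs + 1e-12)
```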

But now, by (1.3), (3.8) and (3.13), for B a generic constant,

$$\sup_h\sup_x \hat f_j(x,h) \le \sup_h\sup_u h^{-1}|K(u)| \le \sup_h Bh^{-1} \le Bn^{1-\delta},$$

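and also, by (5.2),

$$\Bigl|h^{-1}K\Bigl(\frac{x-X_i}{h}\Bigr)-h_\ell^{-1}K\Bigl(\frac{x-X_i}{h_\ell}\Bigr)\Bigr| \le |h^{-1}-h_\ell^{-1}|\sup_u|K(u)| + h_\ell^{-1}M|h^{-1}-h_\ell^{-1}|^{\gamma} \le B\bigl(n^{-3/\gamma}+n^{1-\delta}(n^{-3/\gamma})^{\gamma}\bigr). \tag{5.19}$$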
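Assuming (5.2) is a Hölder condition $|K(u)-K(v)|\le M|u-v|^{\gamma}$, a bound of the form (5.19) follows from splitting $h^{-1}K(a/h)-h_\ell^{-1}K(a/h_\ell)$ into $(h^{-1}-h_\ell^{-1})K(a/h)+h_\ell^{-1}[K(a/h)-K(a/h_\ell)]$, where $a=x-X_i$. The Python sketch keeps the factor $|a|^{\gamma}$ explicit (for a compactly supported $K$ it is bounded on the range where either kernel term is nonzero, so it can be absorbed into $M$). It uses the Epanechnikov kernel ($\gamma=1$, $M=1.5$) purely as an example and checks the inequality on a grid.

```python
import itertools

M_LIP = 1.5   # Lipschitz constant of the Epanechnikov kernel (gamma = 1)
K_MAX = 0.75  # sup_u |K(u)| for the Epanechnikov kernel


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def scaled_kernel(a, h):
    """h^{-1} K(a / h): one summand of the kernel estimator, with a = x - X_i."""
    return epanechnikov(a / h) / h


def hoelder_bound(a, h, g, gamma=1.0, m_const=M_LIP):
    """|1/h - 1/g| sup|K| + g^{-1} M |a|^gamma |1/h - 1/g|^gamma."""
    d = abs(1.0 / h - 1.0 / g)
    return d * K_MAX + m_const * abs(a) ** gamma * d ** gamma / g


grid_a = [i / 50 for i in range(-100, 101)]   # a = x - X_i in [-2, 2]
grid_h = [0.05, 0.1, 0.2, 0.35, 0.5, 1.0]     # bandwidths playing h and h_l
violations = [
    (a, h, g)
    for a, h, g in itertools.product(grid_a, grid_h, grid_h)
    if abs(scaled_kernel(a, h) - scaled_kernel(a, g)) > hoelder_bound(a, h, g) + 1e-9
]
print("violations:", len(violations))
```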


It follows from the above and (3.1) that

$$\sup_{\ell,h}\frac{|N(h)-N(h_\ell)|}{\mathrm{MISE}(h)} \le \frac{Bn^{-1-2\delta}}{n^{-1}} \to 0.$$


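To check (5.18), note that by the above method,

$$|\mathrm{MISE}(h_\ell)-\mathrm{MISE}(h)| \le \int E\bigl[|\hat f(x,h_\ell)-\hat f(x,h)|\cdot|\hat f(x,h_\ell)+\hat f(x,h)-2f(x)|\bigr]w(x)\,dx.$$

But by (1.2) and (5.19),

$$|\hat f(x,h_\ell)-\hat f(x,h)| \le Bn^{-2-\delta}.$$

Thus, since $E|\hat f(x,h_\ell)|$ is bounded,

$$\sup_{\ell,h}\frac{|\mathrm{MISE}(h_\ell)-\mathrm{MISE}(h)|}{\mathrm{MISE}(h)} \le \frac{Bn^{-2-\delta}}{n^{-1}} \to 0.$$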


This  completes  the  proof  of  Lemma  B2  and  hence  also  that  of  Lemma  B. 